🔎 How to download & run the codes?

All the source codes of the aggregation methods are available here . To run the codes, you can clone the repository directly or simply load the R script source file from the repository using devtools package in Rstudio as follow:

  1. Install devtools package using command:

    install.packages("devtools")

  2. Loading the source codes from GitHub repository using source_url function by:

    devtools::source_url("https://raw.githubusercontent.com/hassothea/AggregationMethods/main/KernelAggReg.R")


✎ Note: All codes contained in this Rmarkdown are built with recent version of (version \(>\) 4.1, available here) and Rstudio (version > 2022.02.2+485, available here). Note also that the code chucks are hidden by default.

To see the codes, you can:


1 KFC procedure & important packages

1.1 KFC procedure

KFC procedure is a three-step methodology which puts together clustering and consensual aggregation methods to construct predictions in supervised learning problems. The three steps of the procedure are:

  • Step K : \(K\)-means clustering algorithm is implemented on the input data using several options of Bregman divergences \(({\cal B}_j)_{j=1}^M\) (\(M\) is the number of total divergences used), therefore, the input data is partitioned into many different structures, according to the Bregman divergence used.
  • Step F : For a partition structure given by a divergence \({\cal B}_j\), we fit simple models (linear, for example) on all the clusters of the obtained partition. Then, the collection \({\cal M}_j=\{{\cal M}_{j,k}\}_{k=1}^K\) of these local models is called candidate model, corresponding to the Bregman divergence \({\cal B}_j\). At the end of this step, several candidate models are constructed.
  • Step C : This step aggregates the obtained candidate models using consensual aggregation methods studied in Has (2021) or Fischer and Mougeot (2019).

Remark.1: The prediction of any observation \(x\) given by a candidate model \({\cal M}_j\) is done in two simple steps:

  1. \(x\) is classified into one of the obtained clusters using the corresponding divergence \({\cal B}_j\), i.e., \[x\in{\cal C}_{k^*} \Leftrightarrow {\cal B}_j(c_{k^*},x)=\inf_{1\leq k\leq K}{\cal B}_j(c_k,x)\] where \(\{c_1,...,c_K\}_{k=1}^K\) are the centroids of the corresponding clusters \(\{C_1,...,C_K\}\).

  2. The prediction of \(x\) is given by the corresponding local model \({\cal M}_{j,k^*}\) defined on cluster \(k^*\), i.e., \({\cal M}_j(x)={\cal M}_{j,k^*}(x)\).


The figure above represents the summary of KFC procedure

1.2 Important packages

We prepare all the necessary tools for this Rmarkdown. pacman package allows us to load (if exists) or install (if does not exist) any available packages from The Comprehensive R Archive Network (CRAN) of .

# Check if package "pacman" is already installed 

lookup_packages <- installed.packages()[,1]
if(!("pacman" %in% lookup_packages))
  install.packages("pacman")


# To be installed or loaded
pacman::p_load(magrittr)
pacman::p_load(tidyverse)

## package for "generateMachines"
pacman::p_load(tree)
pacman::p_load(glmnet)
pacman::p_load(randomForest)
pacman::p_load(FNN)
pacman::p_load(xgboost)
pacman::p_load(keras)
pacman::p_load(pracma)
pacman::p_load(latex2exp)
pacman::p_load(plotly)
pacman::p_load(parallel)
pacman::p_load(foreach)
pacman::p_load(doParallel)
rm(lookup_packages)

2 Bregman divergences (BD)

Definition Let \(\phi:\mathcal{C}\rightarrow\mathbb{R}\) be a strictly convex and continuously differentiable function defined on a measurable convex subset \(\mathcal{C}\subset\mathbb{R}^d\). Let \(int(\mathcal{C})\) denote its relative interior. A Bregman divergence indexed by \(\phi\) is a dissimilarity measure \(d_{\phi}:\mathcal{C}\times int(\mathcal{C})\rightarrow\mathbb{R}\) defined for any pair \((x,y)\in \mathcal{C}\times int(\mathcal{C})\) by, \[\begin{equation} \label{eq:1.10} d_{\phi}(x,y)=\phi(x)-\phi(y)-\langle x-y,\nabla\phi(y)\rangle \end{equation}\] where \(\nabla\phi(y)\) denotes the gradient of \(\phi\) computed at a point \(y\in int(\mathcal{C})\). A Bregman divergence is not necessarily a metric as it may not be symmetric and the triangular inequality might not be satisfied.

This section defines all the Bregman divergences used. The list of all the Bregman divergences is given in the table below:


Name \(\phi\) \(d_{\phi}\) \(\cal C\)
Euclidean \({\|x\|_2^2}=\sum_{i=1}^dx_i^2\) \(\|x-y\|_2^2\) \(\mathbb{R}^d\)
General Kullback-Leibler \(\sum_{i=1}^d x_i\ln( x_i)\) \(\sum_{i=1}^d( x_i\ln(\frac{ x_i}{y_i})-(x_i-y_i))\) \((0,+\infty)^d\)
Logistic \(\sum_{i=1}^d(x_i\ln( x_i)+(1- x_i)\ln(1- x_i))\) \(\sum_{i=1}^d\Big( x_i\ln(\frac{x_i}{y_i})+(1- x_i)\ln(\frac{1- x_i}{1-y_i})\Big)\) \((0,1)^d\)
Itakura-Saito \(-\sum_{i=1}^d\ln( x_i)\) \(\sum_{i=1}^d\Big(\frac{ x_i}{y_i}-\ln(\frac{ x_i}{y_i})-1\Big)\) \((0,+\infty)^d\)
Exponential \(\sum_{i=1}^de^{x_i}\) \(\sum_{i=1}^d(e^{x_i}-e^{y_i}-e^{y_i}(x_i-y_i))\) \(\mathbb{R}^d\)
Polynomial \(\sum_{i=1}^d|x|^p,p>2\) \(\sum_{k=1}^d(|x_k|^p-|y_k|^p-\text{sign}(y_k)^pp(x_k-y_k)y_k^{p-1})\) \(\mathbb{R}^d\)

2.1 Look-up list of Bregman divergences

The codes below provide a look-up list of all the BDs defined in the table above.

euclidDiv <- function(X., y., deg = NULL){
    res <- sweep(X., 2, y.)
    return(rowSums(res^2))
}
gklDiv <- function(X., y., deg = NULL){
  res <- c("/", "-") %>%
    map(.f = ~ sweep(X., 2, y., FUN = .x))
  return(rowSums(X.*log(res[[1]]) - res[[2]]))
}
logDiv <- function(X., y., deg = NULL){
    res <-  map2(.x = list(X., 1-X.),
                 .y = list(y., 1-y.),
                 .f = ~ sweep(.x, 2, .y, FUN = "/"))
    return(rowSums(X.*log(res[[1]])+(1-X.)*log(res[[2]])))
}
itaDiv <- function(X., y., deg = NULL){
    res <- sweep(X., 2, y., FUN = "/")
    return(rowSums(res-log(res) - 1))
}
expDiv <- function(X., y., deg = NULL){
    exp_y <- exp(y.)
    res <- sweep(1+X., 2, y.) %>%
      sweep(2, exp_y, FUN = "*")
    return(rowSums(exp(X.)-res))
}
polyDiv <- function(X., y., deg = 3){
    S <- map2(.x = list(X., X.^deg),
              .y = list(y., y.^deg),
              .f = ~ sweep(.x, 
                           MARGIN = 2, 
                           STATS = .y,
                           FUN = "-"))
    if(deg %% 2 == 0){
      Tem <- sweep(S[[1]], 2, y.^(deg-1), FUN = "*")
      res <- rowSums(S[[2]] - deg * Tem)
    }
    else{
      Tem <- sweep(S[[1]], 2, sign(y.) * y.^(deg-1), FUN = "*")
      res <- rowSums(S[[2]] - deg * Tem)
    }
    return(res)
}
lookup_div <- list(
  euclidean = euclidDiv,
  gkl = gklDiv,
  logistic = logDiv,
  itakura = itaDiv,
  exponential = expDiv,
  polynomial = polyDiv
)

2.2 Function : BregmanDiv

This function computes the matrix of BDs between two sets of data points. Each set of data points should be stored as matrices, data frame, or tibble object where each row represents an individual data point.

  • Argument:

    • X., C. : The data matrices, tibbles and data frames, containing the data points for which the Bregman divergences between them are to be computed.
    • div : the type of divergence to be used. It should be a subset of {"euclidean", "gkl", "logistic", "itakura", "exponential", "polynomial"}.
    • deg : the degree of polynomial BD (if one is used).
  • Value:

    This function returns a tibble object \(D=(d_{i,j})\) where \(d_{i,j}\) is the Bregman divergence between row \(i\) of X. and row \(j\) of C..

BregmanDiv <- function(X., 
                       C., 
                       div = c("euclidean",
                                "gkl",
                                "logistic",
                                "itakura",
                                "exponential",
                                "polynomial"),
                       deg = 3){
  div <- match.arg(div)
  d_c <- dim(C.)
  if(is.null(d_c)){
    C <- matrix(C., nrow = 1, byrow = TRUE)
  } else{
    C <- as.matrix(C.)
  }
  if(is.null(dim(X.))){
    X <- matrix(X., nrow = 1, byrow = TRUE)
  } else{
    X <- as.matrix(X.)
  }
  dis <-  map_dfc(.x = 1:dim(C)[1],
                  .f = ~ tibble('{{.x}}' := lookup_div[[div]](X, C[.x,], deg = deg)))
  return(dis)
}

3 Step \(K\): \(K\)-means with Bregman divergences

This section implements \(K\)-means algorithm using Bregman divergences which corresponds to the step \(K\) of KFC procedure.

3.1 Function : findClosestCentroid and newCentroids

These two functions perform the main steps of \(K\)-means algorithm. findClosestCentroid function assigns any data points to some cluster according to the nearest centroid among all the centroids, and newCentroids function compute the new centroids of the partition given the cluster labels of all data points.

  • Argument:

    • x. : The data matrices, tibbles and data frames, containing the data points to be assigned to some cluster.
    • centroids : The matrix or data frame containing the centroids (by row).
    • div : the type of divergence to be used.
    • deg : the degree of polynomial BD (if one is used).
  • Value:

    The each of the two functions returns arguments for one another:

    • findClosestCentroid returns a vector of size equals to the number of rows of data matrix x., containing the cluster labels of the data points.
    • newCentroids returns the matrix of centroids.
findClosestCentroid <- function(x., centroids., div, deg = 3){
  dist <- BregmanDiv(x., centroids., div, deg)
  clust <- 1:nrow(x.) %>%
    map_int(.f = ~ which.min(dist[.x,]))
  return(clust)
}
newCentroids <- function(x., clusters.){
  centroids <- unique(clusters.) %>%
    map_dfr(.f = ~ colMeans(x.[clusters. == .x, ]))
  return(centroids)
}

3.2 Function : kmeansBD

This function performs \(K\)-means algorithm with BDs.

  • Argument:

    • train_input : The data matrices, tibbles and data frames, containing the data points.
    • K : The number of clusters.
    • n_start : the number of times to perform the algorithm, and the best one among them is chosen to be the final result. This is done to avoid local optimal solutions. By default, n_start = 5.
    • maxIter : the maximum number of iterations in case the algorithm does not converge. By default, maxIter = 500.
    • deg : the degree of polynomial BD (if one is used).
    • scale_input : logical value controlling whether to scale the input to be in \((0,1)\) or not. By default, scale_input = FALSE.
    • div : the type of divergence to be used. By default, div = "euclidean" and the usual \(K\)-means algorithm is performed.
    • splits : real number between \(0\) and \(1\) specifying the proportion of training data to be used to perform \(K\)-means algorithm. The remaining part with be used for the aggregation. By default, splits = 1 and all the input data are used.
    • epsilon : the stopping threshold criterion to stop the algorithm. By default, epsilon = 1e-10.
    • center_, scale_ : the center and scale to be used to scale the input data. By default, they are NULL.
    • id_shuffle : a logical vector specifying which part of the training data will be selected to perform the algorithm. This is important when we want to perform the algorithm on the same set of data points but with different BDs.
  • Value:

    This function returns a list of the following objects:

    • centroids : the matrix of the obtained centroids.
    • clusters : a vector of cluster label of the data points.
    • train_data : list of the following important objects
      • X_train : the training data used for the algorithm.
      • X_remain : the remaining part of the input data used for the aggregation.
      • id_remain : the logical vector specifying the remaining part (X_remain) of the input data.
    • parameters : the list of the following objects:
      • div : divergence used.
      • deg : the degree of polynomial BD (if one is used).
      • center_, scale_ : the center and scale used to scale the input data.
    • running_time: the computational time of the algorithm.
kmeansBD <- function(train_input,
                     K,
                     n_start = 5,
                     maxIter = 500,
                     deg = 3,
                     scale_input = FALSE,
                     div = "euclidean",
                     splits = 1,
                     epsilon = 1e-10,
                     center_ = NULL,
                     scale_ = NULL,
                     id_shuffle = NULL){
  start_time <- Sys.time()
  # Distortion function
  X <- as.matrix(train_input)
  N <- dim(X)
  if(scale_input){
    if(!(is.null(center_) & is.null(scale_))){
      if(length(center_) == 1){
        center_ <- rep(center_, N[2])
      }
      if(length(scale_) == 1){
        scale_ <- rep(scale_, N[2])
      }
    } else{
      min_ <- apply(X, 2, FUN = min)
      c_ <- abs(colMeans(X)/5)
      center_ <- min_ - c_
      scale_ <- apply(X, 2, FUN = max) - center_ + 1
    }
    X <- scale(X, center = center_, scale = scale_)
  }
  if(is.null(id_shuffle)){
    train_id <- rep(TRUE, N[1])
    if(splits < 1){
      train_id[sample(N[1], floor(N[1]*(1-splits)))] <- FALSE
    }
  } else{
    train_id <- id_shuffle
  }
  X_train1 <- X[train_id,]
  X_train2 <- X[!train_id,]
  mu <- as.matrix(colMeans(X_train1))
  distortion <- function(clus){
    cent <- newCentroids(X_train1, clus)
    var_within <- 1:K %>%
      map(.f = ~ BregmanDiv(X_train1[clus == .x,], 
                            cent[.x,], 
                            div, 
                            deg)) %>%
      map(.f = sum) %>%
      Reduce("+", .)
    return(var_within)
  }
  # Kmeans algorithm
  kmeansWithBD <- function(x., k., maxiter., eps.) {
    n. <- nrow(x.)
    # initialization
    init <- sample(n., k.)
    centroids_old <- x.[init,]
    i <- 0
    while(i < maxIter){
      # Assignment step
      clusters <- findClosestCentroid(x., centroids_old, div, deg)
      # Recompute centroids
      centroids_new <- newCentroids(x., clusters)
      if ((sum(is.na(centroids_new)) > 0) |
          (nrow(centroids_new) != k.)) {
        init <- sample(n., k.)
        centroids_old <- x.[init,]
        warning("NA produced -> reinitialize centroids...!")
      }
      else{
        if(sum(abs(centroids_new - centroids_old)) > eps.){
          centroids_old <- centroids_new
        } else{
          break
        }
      }
      i <- i + 1
    }
    return(clusters)
  }
  results <- 1:n_start %>% 
    map_dfc(.f = ~ tibble("{{.x}}" := kmeansWithBD(X_train1, 
                                                   K,
                                                   maxIter, 
                                                   epsilon)))
  opt_id <- 1:n_start %>%
    map_dbl(.f = ~ distortion(results[[.x]])) %>%
    which.min
  cluster <- clusters <- results[[opt_id]]
  j <- 1
  ID <- unique(cluster)
  for (i in ID) {
    clusters[cluster == i] = j
    j =  j + 1
  }
  centroids = newCentroids(X_train1, clusters)
  time_taken <- Sys.time() - start_time
  return(
    list(
      centroids = centroids,
      clusters = clusters,
      train_data = list(X_train = X_train1,
                        X_remain = X_train2,
                        id_remain = !train_id),
      parameters = list(div = div,
                        deg = deg,
                        center_ = center_,
                        scale_ = scale_),
      running_time = time_taken
    )
  )
}

Example.1: We perform \(K\)-means algorithm with "gkl" BD on Abalone dataset.


pacman::p_load(readr)
colname <- c("Type", "LongestShell", "Diameter", "Height", "WholeWeight", "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings")
df <- readr::read_delim("https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data", col_names = colname, delim = ",", show_col_types = FALSE)
n <- nrow(df)
train <- logical(n)
train[sample(n,  floor(n*0.8))] <- TRUE
cl <- df[train,2:(ncol(df)-1)] %>%
  kmeansBD(K = 3, div = "gkl", splits = 0.5, scale_input = TRUE)
table(cl$clusters)

  1   2   3 
435 697 539 

4 Step \(F\): Fitting predictive models

This section builds global models by fitting local model on each given cluster of the obtained partition. This corresponds to the step \(F\) of the procedure.

4.1 Function : fitLocalModels

This function fits local models on all clusters of the obtained partition.

  • Argument:

    • kmeans_BD : An object obtained from kmeansBD function.
    • train_response : The vector of response variable corresponding to the full input_data given to kmeanBD function.
    • model : Type of local model to fit on all the clusters of the given partition. It should be either a model object which is compactible
    • formula : The degree of polynomial BD (if one is used).
  • Value:

    This function returns a list of the following objects:

    • local_models : all the local models fitted on all clusters of the given partition.
    • kmeans_BD : the kmeansBD object.
    • data_remain : a list of the following object:
      • fit : the fitted values of the remaining part of the input data.
      • response : the actual response values corresponding to the remaining input data.
    • running_time : the computational time of the algorithm.
fitLocalModels <- function(kmeans_BD,
                           train_response,
                           model = "lm",
                           formula = NULL){
  start_time <- Sys.time()
  X_train <- kmeans_BD$train_data$X_train
  y_train <- train_response[!(kmeans_BD$train_data$id_remain)]
  X_remain <- kmeans_BD$train_data$X_remain
  y_remain <- NULL
  if(!is.null(X_remain)){
    y_remain <- train_response[kmeans_BD$train_data$id_remain]
  }
  pacman::p_load(tree)
  pacman::p_load(randomForest)
  model_ <- ifelse(model == "tree", tree::tree, model)
  K <- nrow(kmeans_BD$centroids)
  if (is.null(formula)){
    form <- formula(target ~ .)
  }
  else{
    form <- update(formula, target ~ .)
  }
  data_ <- bind_cols(X_train, "target":= y_train)
  fit_lookup <- list(lm = "fitted.values",
                     rf = "predicted")
  if(is.character(model_)){
    model_lookup <- list(lm = lm,
                         rf = randomForest::randomForest)
    mod <- map(.x = 1:K, 
               .f = ~ model_lookup[[model_]](formula = form, 
                                             data = data_[kmeans_BD$clusters == .x, ]))
  } else{
    mod <- map(.x = 1:K, 
               .f = ~ model_(formula = form, 
                             data = data_[kmeans_BD$clusters == .x,]))
  }
  pred0 <- NULL
  if(!is.null(X_remain)){
    pred0 <- vector(mode = "numeric", 
                    length = length(y_remain))
    clus <- findClosestCentroid(x. = X_remain,
                                centroids. = kmeans_BD$centroids,
                                div = kmeans_BD$parameters$div,
                                deg = kmeans_BD$parameters$deg)
    for(i_ in 1:K){
      pred0[clus == i_] <- predict(mod[[i_]],
                                   as.data.frame(X_remain[clus == i_,]))
    }
  }
  time_taken <- Sys.time() - start_time
  return(list(
    local_models = mod,
    kmeans_BD = kmeans_BD,
    data_remain = list(fit = pred0,
                       response = y_remain),
    running_time = time_taken
  ))
}

Example.2: From Example.1 above, multiple linear regression models are built on all the obtained clusters. The mean square error of this model, evaluated on the remaining \(50\%\) of the training data is computed.


fit <- fitLocalModels(train_response = df$Rings[train],
                      kmeans_BD = cl,
                      model = "lm")

mean((fit$data_remain$response- fit$data_remain$fit)^2)
[1] 4.71925

4.2 Function : localPredict

This function allows us to predict any new observation using the candidate model \(\cal M_j=({\cal M}_{j,k})_{k=1}^M\) corresponding to Bregman divergence \({\cal B}_j\), for some \(j\in J\subset\{1,...,M\}\).

  • Argument:

    • localModels : The local model object obtained from fitLocalModels function.
    • newData : New input data to be predicted using the candidate models given in localModels argument.
  • Value:

    This function returns a predicted vector of the newData.

localPredict <- function(localModels,
                         newData){
  kmean_BD <- localModels$kmeans_BD
  K <- nrow(kmean_BD$centroids)
  newData_ <- newData
  if(!(is.null(kmean_BD$parameters$center_))){
    newData_ <- scale(newData,
                      center = kmean_BD$parameters$center_,
                      scale = kmean_BD$parameters$scale_)
    id0 <- newData_ <= 0
    if(any(id0)){
      min_ <- min(newData_[id0])
      newData_[id0] <- runif(sum(id0), min(1e-3, min_/10), min_)
    }
  }
  clus <- findClosestCentroid(x. = newData_,
                              centroids. = kmean_BD$centroids,
                              div = kmean_BD$parameters$div,
                              deg = kmean_BD$parameters$deg)
  pred0 <- vector(mode = "numeric", length = nrow(newData_))
  for(i_ in 1:K){
    pred0[clus == i_] <- predict(localModels$local_models[[i_]],
                                 as.data.frame(newData_[clus == i_,]))
  }
  pred0 <- as_tibble(pred0)
  names(pred0) <- ifelse(kmean_BD$parameters$div == "polynomial",
                         paste0("polynomial", kmean_BD$parameters$deg),
                         kmean_BD$parameters$div)
  return(pred0)
}

Example.3: The the performance of the candidate model corresponding to "gkl" divergence is compared to random forest regression on a \(20\%\) testing data.


y_hat <- localPredict(fit,
                      df[!train, 2:(ncol(df)-1),])
rf <- randomForest(Rings ~ ., data = df[train,2:ncol(df)])
mean((predict(rf, newdata = df[!train,2:ncol(df)])- df$Rings[!train])^2)
[1] 4.290698
mean((y_hat$gkl-df$Rings[!train])^2)
[1] 4.072496

5 Step \(C\): Combining methods

The aggregation methods are available here . The aggregation methods are imported into Rstudio environment.

pacman::p_load(devtools)
### Kernel based consensual aggregation
source_url("https://raw.githubusercontent.com/hassothea/AggregationMethods/main/KernelAggReg.R")
ℹ SHA-1 hash of file is a8312879ba2ed0c5335566dd75fb90751025953c
### MixCobra
source_url("https://raw.githubusercontent.com/hassothea/AggregationMethods/main/MixCobraReg.R")
ℹ SHA-1 hash of file is 63f73453cd5d3e7006bd02ee84e8115300a20178

5.1 Function : stepK, stepF and stepC

These functions allow to set the values of the parameters in the three steps of the KFC procedure. Each function returns a list of all the parameters given in its arguments.

stepK = function(K,
                 n_start = 5,
                 maxIter = 300,
                 deg = 3,
                 scale_input = FALSE,
                 div = NULL,
                 splits = 0.75,
                 epsilon = 1e-10,
                 center_ = NULL,
                 scale_ = NULL){
  return(list(K = K,
              n_start = n_start,
              maxIter = maxIter,
              deg = deg,
              scale_input = scale_input,
              div = div,
              splits = splits,
              epsilon = epsilon,
              center_ = center_ ,
              scale_ = scale_))
}

stepF = function(formula = NULL, 
                 model = "lm"){
  return(list(formula = formula, 
              model = model))
}

stepC = function(n_cv = 5,
                 method = c("cobra", "mixcobra"),
                 opt_methods = c("grad", "grid"),
                 kernels = "gaussian",
                 scale_features = FALSE){
  return(list(n_cv = n_cv,
              method = method,
              opt_methods = opt_methods,
              kernels = kernels,
              scale_features = scale_features))
}

6 Function: KFCreg

This function is the complete implementation of KFC procedure.


Remark.2: The parallel argument above requires internet connection to load the source codes of \(K\)-means algorithm with BDs from GitHub . It is performed on the maximum number of clusters of your machine, and the speed is at least two times faster than without parallelism, however, it is not so stable depending on your internet connection or machine. About the aggregation methods of step \(C\),


KFCreg = function(train_input,
                  train_response,
                  test_input,
                  test_response = NULL,
                  n_cv = 5,
                  parallel = TRUE,
                  inv_sigma = sqrt(.5),
                  alp = 2,
                  K_step = stepK(splits = .5),
                  F_step = stepF(),
                  C_step = stepC(),
                  setGradParamAgg = setGradParameter(),
                  setGridParamAgg = setGridParameter(),
                  setGradParamMix = setGradParameter_Mix(),
                  setGridParamMix = setGridParameter_Mix()){
  start_time <- Sys.time()
  lookup_div_names <- c("euclidean",
                         "gkl",
                         "logistic",
                         "itakura",
                         "polynomial")
  div_ <- K_step$div
  ### K step: Kmeans clustering with BDs
  if (is.null(K_step$div)){
    divergences <- lookup_div_names
    warning("No divergence provided! All of them are used!")
  }
  else{
    divergences <- K_step$div %>% 
      map_chr(.f = ~ match.arg(arg = .x, 
                               choices = lookup_div_names))
  }
  div_list <- divergences %>% 
    map(.f = (\(x) if(x != "polynomial") return(x) else return(rep("polynomial", length(K_step$deg))))) %>%
    unlist
  deg_list <- rep(NA, length(div_))
  deg_list[div_list == "polynomial"] <- K_step$deg
  div_names <- map2_chr(.x = div_list,
                        .y = deg_list,
                        .f = (\(x, y) if(is.na(y)) return(x) else return(paste0(x,y))))
  ### Step K: Kmeans clustering with Bregman divergences
  dm <- dim(train_input)
  id_shuffle <- vector(length = dm[1])
  n_train <- floor(K_step$splits * dm[1])
  id_shuffle[sample(dm[1], n_train)] <- TRUE
  if(parallel){
    numCores <- parallel::detectCores()
    doParallel::registerDoParallel(numCores) # use multicore, set to the number of our cores
    kmean_ <- foreach(i=1:length(div_names)) %dopar% {
      devtools::source_url("https://raw.githubusercontent.com/hassothea/KFC-Procedure/master/kmeanBD.R")
      kmeansBD(train_input = train_input,
               K = K_step$K,
               div = div_list[i],
               n_start = K_step$n_start,
               maxIter = K_step$maxIter,
               deg = deg_list[i],
               scale_input = K_step$scale_input,
               splits = K_step$splits,
               epsilon = K_step$epsilon,
               center_ = K_step$center_,
               scale_ = K_step$scale_,
               id_shuffle = id_shuffle)
    }
    doParallel::stopImplicitCluster()
  } else{
    kmean_ <- map2(.x = div_list,
                   .y = deg_list,
                   .f = ~ kmeansBD(train_input = train_input,
                                   K = K_step$K,
                                   div = .x,
                                   n_start = K_step$n_start,
                                   maxIter = K_step$maxIter,
                                   deg = .y,
                                   scale_input = K_step$scale_input,
                                   splits = K_step$splits,
                                   epsilon = K_step$epsilon,
                                   center_ = K_step$center_,
                                   scale_ = K_step$scale_,
                                   id_shuffle = id_shuffle))
  }
  names(kmean_) <- div_names
  ### F step: Fitting the corresponding model on each observed cluster
  model_ <- div_names %>%
    map(.f = ~ fitLocalModels(kmean_[[.x]],
                              train_response = train_response,
                              model = F_step$model,
                              formula = F_step$formula))
  names(model_) <- div_names
  pred_combine <- model_ %>%
    map_dfc(.f = ~ .x$data_remain$fit)
  y_remain <- train_response[!id_shuffle]
  pred_test <- div_names %>%
    map_dfc(.f = ~ localPredict(model_[[.x]],
                                test_input))
  names(pred_test) <- names(pred_combine) <- div_names
  # C step: Consensual regression aggregation method with kernel-based COBRA
  list_method_agg <- list(mixcobra = function(pred){MixCobraReg(train_input = train_input[!id_shuffle,],
                                                                train_response = y_remain,
                                                                test_input = test_input,
                                                                train_predictions = pred,
                                                                test_predictions = pred_test,
                                                                test_response = test_response,
                                                                scale_input = K_step$scale_input,
                                                                scale_machine = C_step$scale_features,
                                                                n_cv = C_step$n_cv,
                                                                inv_sigma = inv_sigma,
                                                                alp = alp,
                                                                kernels = C_step$kernels,
                                                                optimizeMethod = C_step$opt_methods,
                                                                setGradParam = setGradParamMix,
                                                                setGridParam = setGridParamMix)},
                          cobra = function(pred){kernelAggReg(train_design = pred,
                                                              train_response = y_remain,
                                                              test_design = pred_test,
                                                              test_response = test_response,
                                                              scale_input = K_step$scale_input,
                                                              scale_machine = C_step$scale_features,
                                                              build_machine = FALSE,
                                                              machines = NULL,
                                                              n_cv = C_step$n_cv,
                                                              inv_sigma = sqrt(.5),
                                                              alp = 2,
                                                              kernels = C_step$kernels,
                                                              optimizeMethod = C_step$opt_methods,
                                                              setGradParam = setGradParamAgg,
                                                              setGridParam = setGridParamAgg)})
  res <- map(.x = C_step$method,
             .f = ~ list_method_agg[[.x]](pred_combine))
  list_agg_methods <- list(cobra = "cob",
                           mixcobra = "mix")
  names(res) <- C_step$method
  ext_fun <- function(L, nam){
    tab <- L$fitted_aggregate
    names(tab) <- paste0(names(tab), "_", nam)
    return(tab)
  }
  pred_fin <- C_step$method %>%
    map_dfc(.f = ~ ext_fun(res[[.x]], list_agg_methods[[.x]]))
  time.taken <- Sys.time() - start_time
  ### To finish
  if(is.null(test_response)){
    return(list(
    predict_final = pred_fin,
    predict_local = pred_test,
    agg_method = res,
    running_time = time.taken
  ))
  } else{
    error <- cbind(pred_test, pred_fin) %>%
      dplyr::mutate(y_test = test_response) %>%
      dplyr::summarise_all(.funs = ~ (. - y_test)) %>%
      dplyr::select(-y_test) %>%
      dplyr::summarise_all(.funs = ~ mean(.^2))
    return(list(
      predict_final = pred_fin,
      predict_local = pred_test,
      agg_method = res,
      mse = error,
      running_time = time.taken
  ))
  }
}

Example.4: A complete KFC procedure is implemented on the same Abalone data, using \(6\) BDs "euclidean", "itakura", "gkl", "logistic" and "polynomial" (of degree \(3\) and \(6\)). Both aggregation methods are used in the step \(C\). Two kernel functions are used for each aggregation method: "gaussian" (with gradient descent algorithm) and "epanechnikov" (with grid search algorithm).

train1 <- logical(n)
train1[sample(n,  floor(n*0.8))] <- TRUE
kfc1 <- KFCreg(train_input = df[train1,2:ncol(df)],
                train_response = df$Rings[train1],
                test_input = df[!train1,2:ncol(df)],
                K_step = stepK(K = 3,
                               scale_input = TRUE,
                               div = c("eucl", "ita", "gkl", "log" ,"poly"),
                               deg = c(3, 6),
                               splits = .5),
                C_step = stepC(method = c("cobra", "mixcobra"),
                               opt_methods = c("grad", "grid"),
                               kernels = c("gaussian", "gaussian"),
                               scale_features = FALSE),
                setGradParamAgg = setGradParameter(rate = 1),
                setGridParamAgg = setGridParameter(min_val = .00001,
                                                   max_val = 10,
                                                   n_val = 100),
                setGradParamMix = setGradParameter_Mix(rate = c(1,1)),
                setGridParamMix = setGridParameter_Mix(min_alpha = 1e-10,
                                                       max_alpha = 5,
                                                       min_beta = 1e-10,
                                                       max_beta = 10,
                                                       n_alpha = 20,
                                                       n_beta = 20))


Kernel-based consensual aggregation method
------------------------------------------

* Gradient descent algorithm ...
  Step  |  Parameter    |  Gradient |  Threshold 
 ---------------------------------------------------
   0    |  23.3334  |  -2.75e-11    |  1e-10 
 ---------------------------------------------------
   1    |  24.3334  |  -1.742e-10   |  0.0428571 

   2    |  30.6667  |  1.467e-10    |  0.2602737 

   3    |  25.3334  |  -1.1e-10     |  0.1739129 

   4    |  29.3334  |  2.567e-10    |  0.1578946 

   5    |  20.0000  |  -2.75e-11    |  0.3181816 

   6    |  21.0000  |  2.108e-10    |  0.04999994 

   7    |  13.3334  |  1.1e-10  |  0.365079 

   8    |  11.3334  |  0e+00    |  0.1499998 
-------------------------------------------------------
 Stopped|  11.3334  |  0        |  0
 ~ Observed parameter: 11.33336  in 8 iterations.
* Grid search algorithm... 
 ~ Observed parameter : 7.47475

MixCobra for regression
-----------------------

* Gradient descent algorithm ...
  Step  |  alpha    ;  beta     |  Gradient (alpha ; beta)  |  Threshold 
 --------------------------------------------------------------------------------
   0    |  7.50002  ;  37.52500     |  -2.292e-10  ;  -1.1e-10      |  1e-10 
 --------------------------------------------------------------------------------
   1    |  8.5000  ;  38.5250   |  -1.467e-10  ;  0e+00     |  0.04441974 

   2    |  9.1400  ;  38.5250   |  -1.375e-10  ;  -2.75e-11     |  0.01360977 

   3    |  9.7400  ;  38.7750   |  2.75e-11  ;  -3.667e-11  |  0.01783278 

   4    |  9.6200  ;  39.1083   |  -2.75e-11  ;  -1.833e-10     |  0.009344184 

   5    |  9.7400  ;  40.7750   |  -9.167e-11  ;  -8.25e-11     |  0.03666585 

   6    |  10.1400  ;  41.5250  |  -1.1e-10  ;  1.1e-10     |  0.0227655 

   7    |  10.6200  ;  40.5250  |  -2.75e-11  ;  2.75e-11   |  0.02864607 

   8    |  10.7400  ;  40.4000  |  3.667e-11  ;  -2.292e-10     |  0.0047903 

   9    |  10.5800  ;  41.4417  |  0e+00  ;  1.375e-10  |  0.02349758 

   10   |  10.5800  ;  41.1292  |  -1.375e-10  ;  -3.759e-10    |  0.00600711 

   11   |  10.7300  ;  41.5562  |  1.1e-10  ;  2.017e-10    |  0.01116017 

   12   |  10.6700  ;  41.4417  |  -1.1e-10  ;  0e+00   |  0.00333899 

   13   |  10.7000  ;  41.4417  |  -1.192e-10  ;  -2.75e-11     |  0.0005756866 

   14   |  10.7163  ;  41.4456  |  -1.1e-10  ;  0e+00   |  0.0003865669 

   15   |  10.7313  ;  41.4456  |  -2.75e-11  ;  3.667e-11  |  0.0002875665 

   16   |  10.7350  ;  41.4443  |  -1.1e-10  ;  -1.1e-10    |  9.682615e-05 

   17   |  10.7500  ;  41.4462  |  1.192e-10  ;  2.108e-10  |  0.0003249014 

   18   |  10.7338  ;  41.4444  |  0e+00  ;  -9.167e-11     |  0.0003471848 

   19   |  10.7338  ;  41.4448  |  -1.558e-10  ;  -1.1e-10  |  7.798307e-06 

   20   |  10.7391  ;  41.4450  |  -3.667e-11  ;  5.5e-11   |  0.0001064928 

   21   |  10.7397  ;  41.4449  |  9.167e-11  ;  2.75e-11   |  1.431605e-05 

   22   |  10.7382  ;  41.4449  |  0e+00  ;  -1.833e-10     |  3.052659e-05 

   23   |  10.7382  ;  41.4451  |  -1.1e-10  ;  -3.667e-11  |  3.898789e-06 

   24   |  10.7386  ;  41.4451  |  -1.375e-10  ;  9.167e-11     |  9.372653e-06 

   25   |  10.7389  ;  41.4450  |  -1.192e-10  ;  -3.667e-11    |  6.588866e-06 

   26   |  10.7392  ;  41.4450  |  -1.192e-10  ;  3.667e-11     |  5.060538e-06 

   27   |  10.7394  ;  41.4450  |  1.192e-10  ;  2.75e-11   |  4.963045e-06 

   28   |  10.7392  ;  41.4450  |  -3.667e-11  ;  6.417e-11     |  4.902104e-06 

   29   |  10.7392  ;  41.4450  |  2.75e-11  ;  7.334e-11   |  8.338345e-07 

   30   |  10.7392  ;  41.4450  |  2.75e-11  ;  1.192e-10   |  3.781737e-07 

   31   |  10.7392  ;  41.4450  |  -1.1e-10  ;  -9.167e-11  |  2.987378e-07 

   32   |  10.7392  ;  41.4450  |  -1.467e-10  ;  -1.467e-10    |  6.832472e-07 

   33   |  10.7392  ;  41.4450  |  -1.1e-10  ;  3.667e-11   |  4.717424e-07 

   34   |  10.7392  ;  41.4450  |  -9.167e-11  ;  2.75e-11  |  3.050729e-07 

   35   |  10.7393  ;  41.4450  |  -1.558e-10  ;  -2.108e-10    |  2.430593e-07 

   36   |  10.7393  ;  41.4450  |  3.667e-11  ;  0e+00  |  4.677214e-07 

   37   |  10.7393  ;  41.4450  |  1.375e-10  ;  1.467e-10  |  9.356861e-08 

   38   |  10.7393  ;  41.4450  |  -1.192e-10  ;  -3.667e-11    |  1.876246e-07 

   39   |  10.7393  ;  41.4450  |  0e+00  ;  0e+00  |  1.53572e-07 

   40   |  10.7393  ;  41.4450  |  0e+00  ;  0e+00  |  1.558412e-10 
--------------------------------------------------------------------------------
 Stopped|  10.7393  ;  41.4450  |     0         |  0
 ~ Observed parameter: (alpha, beta) = ( 10.73927 ,  41.44502 ) in 40 itertaions.

* Grid search algorithm...
 ~ Observed parameter: (alpha, beta) = (1e-10, 7.368421)

The mean square errors evaluated on \(20\%\)-testing data of the above computation are reported below.

kfc1$predict_final %>%
  mutate(rf = predict(rf1, newdata = df[!train1,2:ncol(df)])) %>%
  sweep(MARGIN = 1, STATS = df$Rings[!train1], FUN = "-") %>%
  .^2  %>%
  colMeans
gaussian_grad_cob gaussian_grid_cob gaussian_grad_mix gaussian_grid_mix                rf 
     6.453406e-28      2.144367e-20      1.803888e+00      5.635831e+00      4.603206e+00 

📖 Read also KernelAggReg and MixCobraReg.



---
title: "<span style='color: #1C81AA;'>**KFC procedure for regression -</span> [Has et al. (2021)](https://www.tandfonline.com/eprint/YKGS8GTKDBKYFXEGFWSB/full?target=10.1080/00949655.2021.1891539)**"
author: "<span style='color: #D4A51C;'>***Sothea Has***</span>"
date: "5/20/2022"
output:
  html_document:
    css: hideOutput.css
    includes:
      in_header: hideOutput.script
    df_print: paged
    code_folding: hide
    number_sections: yes
    toc: yes
    toc_depth: '2'
    tocdepth: 2
  html_notebook:
    css: hideOutput.css
    includes:
      in_header: hideOutput.script
    code_folding: hide
    number_sections: yes
    toc: yes
    toc_depth: 2
    tocdepth: 2
  pdf_document:
    toc: yes
    toc_depth: '2'
---

<style>
  .btn {
    border-width: 0 0px 0px 0px;
    font-weight: normal;
    text-transform: ;
  }
.btn-default {
  color: #2ecc71;
    background-color: #ffffff;
    border-color: #ffffff;
}
</style>

<!-- Colors
blue : #1FAAE3
yellow : #F0AE14
green : #54D319 
red : #E6180A
-->


```{r, echo=FALSE}
# Check if package "fontawesome" is already installed 

lookup_packages <- installed.packages()[,1]
if(!("fontawesome" %in% lookup_packages))
  install.packages("fontawesome")
```


<span style="color: #1FAAE3;">&#128270;<u> How to download & run the codes?</u></span>{-}
===

All the source codes of the aggregation methods are available [here <span style="color: #097BC1"> `r fontawesome::fa("github")`</span>](https://github.com/hassothea/AggregationMethods). To run the codes, you can <span style="color: #097BC1">`clone`</span> the repository directly or simply load the <span style="color: #097BC1">`R script`</span> source file from the repository using [devtools](https://cran.r-project.org/web/packages/devtools/index.html) package in <span style="color: #0287D8;"> **Rstudio** </span> as follow:

1. Install [devtools](https://cran.r-project.org/web/packages/devtools/index.html) package using command: 

    `install.packages("devtools")`

2. Loading the source codes from <span style="color: #097BC1">GitHub `r fontawesome::fa("github")`</span> repository using `source_url` function by: 

    `devtools::source_url("https://raw.githubusercontent.com/hassothea/AggregationMethods/main/KernelAggReg.R")`

---

> **&#9998; Note**: All codes contained in this `Rmarkdown` are built with recent version of <span style="color: #097BC1">`r fontawesome::fa("r-project")`</span> (version $>$ 4.1, available [here](https://cran.r-project.org/bin/windows/base/)) and <span style="color: #0287D8;"> **Rstudio** </span> (version > `2022.02.2+485`, available [here](https://www.rstudio.com/products/rstudio/download/#download)). Note also that the code chucks are <span style="color: #E6180A;">hidden</span> by default.

<span style="color: #F0AE14"> **To see the codes, you can:** </span>

- click on the top-right <span style="color: #54D319 ;">`Code`</span> button of the page, then choose **Show All Code** to show all the codes, or 
- simply click on the right-corner <span style="color: #54D319 ;">`Code`</span> button at each section to show the codes of that specific section.

---

<span style="color: #1FAAE3;"><u>KFC procedure & important packages </u></span>
===

<span style="color: #F0AE14;"><u>KFC procedure</u></span>
---

KFC procedure is a three-step methodology which puts together clustering and consensual aggregation methods to construct predictions in supervised learning problems. The three steps of the procedure are:

- **Step *K* **: $K$-means clustering algorithm is implemented on the input data using several options of Bregman divergences $({\cal B}_j)_{j=1}^M$ ($M$ is the number of total divergences used), therefore, the input data is partitioned into many different structures, according to the Bregman divergence used.
- **Step *F* **: For a partition structure given by a divergence ${\cal B}_j$, we fit simple models (linear, for example) on all the clusters of the obtained partition. Then, the collection ${\cal M}_j=\{{\cal M}_{j,k}\}_{k=1}^K$ of these local models is called *candidate* model, corresponding to the Bregman divergence ${\cal B}_j$. At the end of this step, several candidate models are constructed. 
- **Step *C* **: This step aggregates the obtained candidate models using consensual aggregation methods studied in [Has (2021)](https://hal.archives-ouvertes.fr/hal-02884333v5) or [Fischer and Mougeot (2019)](https://www.sciencedirect.com/science/article/pii/S0378375818302349).


---

>**Remark.1**: 
The prediction of any observation $x$ given by a candidate model ${\cal M}_j$ is done in two simple steps:

1. $x$ is classified into one of the obtained clusters using the corresponding divergence ${\cal B}_j$, i.e.,
  $$x\in{\cal C}_{k^*} \Leftrightarrow {\cal B}_j(c_{k^*},x)=\inf_{1\leq k\leq K}{\cal B}_j(c_k,x)$$
where $\{c_1,...,c_K\}_{k=1}^K$ are the centroids of the corresponding clusters $\{C_1,...,C_K\}$.

2. The prediction of $x$ is given by the corresponding local model ${\cal M}_{j,k^*}$ defined on cluster $k^*$, i.e., ${\cal M}_j(x)={\cal M}_{j,k^*}(x)$.

---

![The figure above represents the summary of KFC procedure](./kfc.png)

<span style="color: #F0AE14;"><u>Important packages</u></span>
---

We prepare all the necessary tools for this `Rmarkdown`. `pacman` package allows us to load (if exists) or install (if does not exist) any available packages from [The Comprehensive R Archive Network (CRAN)](https://cran.r-project.org/) of <span style="color: #097BC1">`r fontawesome::fa("r-project")`</span>. 


```{r}
# Check if package "pacman" is already installed 

lookup_packages <- installed.packages()[,1]
if(!("pacman" %in% lookup_packages))
  install.packages("pacman")


# To be installed or loaded
pacman::p_load(magrittr)
pacman::p_load(tidyverse)

## package for "generateMachines"
pacman::p_load(tree)
pacman::p_load(glmnet)
pacman::p_load(randomForest)
pacman::p_load(FNN)
pacman::p_load(xgboost)
pacman::p_load(keras)
pacman::p_load(pracma)
pacman::p_load(latex2exp)
pacman::p_load(plotly)
pacman::p_load(parallel)
pacman::p_load(foreach)
pacman::p_load(doParallel)
rm(lookup_packages)
```


<span style="color: #1FAAE3;"><u>Bregman divergences (BD)</u></span>
===

<span style="color: #1FAAE3;">**Definition**</span> Let $\phi:\mathcal{C}\rightarrow\mathbb{R}$ be a strictly convex and continuously differentiable function defined on a measurable convex subset $\mathcal{C}\subset\mathbb{R}^d$. Let $int(\mathcal{C})$ denote its relative interior. A Bregman divergence indexed by $\phi$ is a dissimilarity measure $d_{\phi}:\mathcal{C}\times int(\mathcal{C})\rightarrow\mathbb{R}$ defined for any pair $(x,y)\in \mathcal{C}\times int(\mathcal{C})$ by,
\begin{equation}
\label{eq:1.10}
d_{\phi}(x,y)=\phi(x)-\phi(y)-\langle x-y,\nabla\phi(y)\rangle 
\end{equation}
where $\nabla\phi(y)$ denotes the gradient of $\phi$ computed at a point $y\in int(\mathcal{C})$. A Bregman divergence is not necessarily a metric as it may not be symmetric and the triangular inequality might not be satisfied.

This section defines all the Bregman divergences used. The list of all the Bregman divergences is given in the table below:

----

Name                        $\phi$                                        $d_{\phi}$                                                                                                    $\cal C$
------------------------- ----------------------------------------------- ----------------------------------------------------------------------------------- -----------
Euclidean                 ${\|x\|_2^2}=\sum_{i=1}^dx_i^2$                   $\|x-y\|_2^2$                                                                     $\mathbb{R}^d$
General Kullback-Leibler  $\sum_{i=1}^d x_i\ln( x_i)$                       $\sum_{i=1}^d( x_i\ln(\frac{ x_i}{y_i})-(x_i-y_i))$                               $(0,+\infty)^d$                                  
Logistic                  $\sum_{i=1}^d(x_i\ln( x_i)+(1- x_i)\ln(1- x_i))$  $\sum_{i=1}^d\Big( x_i\ln(\frac{x_i}{y_i})+(1- x_i)\ln(\frac{1- x_i}{1-y_i})\Big)$   $(0,1)^d$
Itakura-Saito             $-\sum_{i=1}^d\ln( x_i)$                          $\sum_{i=1}^d\Big(\frac{ x_i}{y_i}-\ln(\frac{ x_i}{y_i})-1\Big)$                    $(0,+\infty)^d$
Exponential               $\sum_{i=1}^de^{x_i}$                             $\sum_{i=1}^d(e^{x_i}-e^{y_i}-e^{y_i}(x_i-y_i))$                                    $\mathbb{R}^d$
Polynomial                $\sum_{i=1}^d|x|^p,p>2$                           $\sum_{k=1}^d(|x_k|^p-|y_k|^p-\text{sign}(y_k)^pp(x_k-y_k)y_k^{p-1})$             $\mathbb{R}^d$

----

<span style="color: #F0AE14;"><u>Look-up list of Bregman divergences</u></span>
----

The codes below provide a look-up list of all the BDs defined in the table above.

```{r}
euclidDiv <- function(X., y., deg = NULL){
    res <- sweep(X., 2, y.)
    return(rowSums(res^2))
}
gklDiv <- function(X., y., deg = NULL){
  res <- c("/", "-") %>%
    map(.f = ~ sweep(X., 2, y., FUN = .x))
  return(rowSums(X.*log(res[[1]]) - res[[2]]))
}
logDiv <- function(X., y., deg = NULL){
    res <-  map2(.x = list(X., 1-X.),
                 .y = list(y., 1-y.),
                 .f = ~ sweep(.x, 2, .y, FUN = "/"))
    return(rowSums(X.*log(res[[1]])+(1-X.)*log(res[[2]])))
}
itaDiv <- function(X., y., deg = NULL){
    res <- sweep(X., 2, y., FUN = "/")
    return(rowSums(res-log(res) - 1))
}
expDiv <- function(X., y., deg = NULL){
    exp_y <- exp(y.)
    res <- sweep(1+X., 2, y.) %>%
      sweep(2, exp_y, FUN = "*")
    return(rowSums(exp(X.)-res))
}
polyDiv <- function(X., y., deg = 3){
    S <- map2(.x = list(X., X.^deg),
              .y = list(y., y.^deg),
              .f = ~ sweep(.x, 
                           MARGIN = 2, 
                           STATS = .y,
                           FUN = "-"))
    if(deg %% 2 == 0){
      Tem <- sweep(S[[1]], 2, y.^(deg-1), FUN = "*")
      res <- rowSums(S[[2]] - deg * Tem)
    }
    else{
      Tem <- sweep(S[[1]], 2, sign(y.) * y.^(deg-1), FUN = "*")
      res <- rowSums(S[[2]] - deg * Tem)
    }
    return(res)
}
lookup_div <- list(
  euclidean = euclidDiv,
  gkl = gklDiv,
  logistic = logDiv,
  itakura = itaDiv,
  exponential = expDiv,
  polynomial = polyDiv
)
```


<span style="color: #F0AE14;"><u>Function</u></span> : `BregmanDiv`
----

This function computes the matrix of BDs between two sets of data points. Each set of data points should be stored as matrices, data frame, or tibble object where each row represents an individual data point.

- **Argument**:

    - `X.`, `C.` : The data matrices, tibbles and data frames, containing the data points for which the Bregman divergences between them are to be computed.
    - `div` : the type of divergence to be used. It should be a subset of `{"euclidean", "gkl", "logistic", "itakura", "exponential", "polynomial"}`.
    - `deg` : the degree of polynomial BD (if one is used).
- **Value**: 
    
    This function returns a *tibble* object $D=(d_{i,j})$ where $d_{i,j}$ is the Bregman divergence between row $i$ of `X.` and row $j$ of `C.`.


```{r}
BregmanDiv <- function(X., 
                       C., 
                       div = c("euclidean",
                                "gkl",
                                "logistic",
                                "itakura",
                                "exponential",
                                "polynomial"),
                       deg = 3){
  div <- match.arg(div)
  d_c <- dim(C.)
  if(is.null(d_c)){
    C <- matrix(C., nrow = 1, byrow = TRUE)
  } else{
    C <- as.matrix(C.)
  }
  if(is.null(dim(X.))){
    X <- matrix(X., nrow = 1, byrow = TRUE)
  } else{
    X <- as.matrix(X.)
  }
  dis <-  map_dfc(.x = 1:dim(C)[1],
                  .f = ~ tibble('{{.x}}' := lookup_div[[div]](X, C[.x,], deg = deg)))
  return(dis)
}
```

<span style="color: #1FAAE3;"><u>Step $K$: $K$-means with Bregman divergences</u></span>
===

This section implements $K$-means algorithm using Bregman divergences which corresponds to the step $K$ of KFC procedure.

<span style="color: #F0AE14;"><u>Function</u></span> : `findClosestCentroid` and `newCentroids`
----

These two functions perform the main steps of $K$-means algorithm. `findClosestCentroid` function assigns any data points to some cluster according to the nearest centroid among all the centroids, and `newCentroids` function compute the new centroids of the partition given the cluster labels of all data points.

- **Argument**:

    - `x.` : The data matrices, tibbles and data frames, containing the data points to be assigned to some cluster.
    - `centroids` : The matrix or data frame containing the centroids (by row).
    - `div` : the type of divergence to be used.
    - `deg` : the degree of polynomial BD (if one is used).
    
- **Value**: 

    The each of the two functions returns arguments for one another:
    
    - `findClosestCentroid` returns a vector of size equals to the number of rows of data matrix `x.`, containing the cluster labels of the data points.
    - `newCentroids` returns the matrix of centroids.

```{r}
findClosestCentroid <- function(x., centroids., div, deg = 3){
  dist <- BregmanDiv(x., centroids., div, deg)
  clust <- 1:nrow(x.) %>%
    map_int(.f = ~ which.min(dist[.x,]))
  return(clust)
}
newCentroids <- function(x., clusters.){
  centroids <- unique(clusters.) %>%
    map_dfr(.f = ~ colMeans(x.[clusters. == .x, ]))
  return(centroids)
}
```

<span style="color: #F0AE14;"><u>Function</u></span> : `kmeansBD`
----

This function performs $K$-means algorithm with BDs. 

- **Argument**:

    - `train_input` : The data matrices, tibbles and data frames, containing the data points.
    - `K` : The number of clusters.
    - `n_start` : the number of times to perform the algorithm, and the best one among them is chosen to be the final result. This is done to avoid local optimal solutions. By default, `n_start = 5`.
    - `maxIter` : the maximum number of iterations in case the algorithm does not converge. By default, `maxIter = 500`.
    - `deg` : the degree of polynomial BD (if one is used).
    - `scale_input` : logical value controlling whether to scale the input to be in $(0,1)$ or not. By default, `scale_input = FALSE`.
    - `div` : the type of divergence to be used. By default, `div = "euclidean"` and the usual $K$-means algorithm is performed.
    - `splits` : real number between $0$ and $1$ specifying the proportion of training data to be used to perform $K$-means algorithm. The remaining part with be used for the aggregation. By default, `splits = 1` and all the input data are used.
    - `epsilon` : the stopping threshold criterion to stop the algorithm. By default, `epsilon = 1e-10`.
    - `center_`, `scale_` : the center and scale to be used to scale the input data. By default, they are `NULL`.
    - `id_shuffle` : a logical vector specifying which part of the training data will be selected to perform the algorithm. This is important when we want to perform the algorithm on the same set of data points but with different BDs. 
    
- **Value**: 

    This function returns a list of the following objects:
    - `centroids` : the matrix of the obtained centroids.
    - `clusters` : a vector of cluster label of the data points.
    - `train_data` : list of the following important objects
        - `X_train` : the training data used for the algorithm.
        - `X_remain` : the remaining part of the input data used for the aggregation.
        - `id_remain` : the logical vector specifying the remaining part (`X_remain`) of the input data.
    - `parameters` : the list of the following objects:
        - `div` : divergence used.
        - `deg` : the degree of polynomial BD (if one is used).
        - `center_`, `scale_` : the center and scale used to scale the input data.
    - `running_time`: the computational time of the algorithm.

```{r}
kmeansBD <- function(train_input,
                     K,
                     n_start = 5,
                     maxIter = 500,
                     deg = 3,
                     scale_input = FALSE,
                     div = "euclidean",
                     splits = 1,
                     epsilon = 1e-10,
                     center_ = NULL,
                     scale_ = NULL,
                     id_shuffle = NULL){
  start_time <- Sys.time()
  # Distortion function
  X <- as.matrix(train_input)
  N <- dim(X)
  if(scale_input){
    if(!(is.null(center_) & is.null(scale_))){
      if(length(center_) == 1){
        center_ <- rep(center_, N[2])
      }
      if(length(scale_) == 1){
        scale_ <- rep(scale_, N[2])
      }
    } else{
      min_ <- apply(X, 2, FUN = min)
      c_ <- abs(colMeans(X)/5)
      center_ <- min_ - c_
      scale_ <- apply(X, 2, FUN = max) - center_ + 1
    }
    X <- scale(X, center = center_, scale = scale_)
  }
  if(is.null(id_shuffle)){
    train_id <- rep(TRUE, N[1])
    if(splits < 1){
      train_id[sample(N[1], floor(N[1]*(1-splits)))] <- FALSE
    }
  } else{
    train_id <- id_shuffle
  }
  X_train1 <- X[train_id,]
  X_train2 <- X[!train_id,]
  mu <- as.matrix(colMeans(X_train1))
  distortion <- function(clus){
    cent <- newCentroids(X_train1, clus)
    var_within <- 1:K %>%
      map(.f = ~ BregmanDiv(X_train1[clus == .x,], 
                            cent[.x,], 
                            div, 
                            deg)) %>%
      map(.f = sum) %>%
      Reduce("+", .)
    return(var_within)
  }
  # Kmeans algorithm
  kmeansWithBD <- function(x., k., maxiter., eps.) {
    n. <- nrow(x.)
    # initialization
    init <- sample(n., k.)
    centroids_old <- x.[init,]
    i <- 0
    while(i < maxIter){
      # Assignment step
      clusters <- findClosestCentroid(x., centroids_old, div, deg)
      # Recompute centroids
      centroids_new <- newCentroids(x., clusters)
      if ((sum(is.na(centroids_new)) > 0) |
          (nrow(centroids_new) != k.)) {
        init <- sample(n., k.)
        centroids_old <- x.[init,]
        warning("NA produced -> reinitialize centroids...!")
      }
      else{
        if(sum(abs(centroids_new - centroids_old)) > eps.){
          centroids_old <- centroids_new
        } else{
          break
        }
      }
      i <- i + 1
    }
    return(clusters)
  }
  results <- 1:n_start %>% 
    map_dfc(.f = ~ tibble("{{.x}}" := kmeansWithBD(X_train1, 
                                                   K,
                                                   maxIter, 
                                                   epsilon)))
  opt_id <- 1:n_start %>%
    map_dbl(.f = ~ distortion(results[[.x]])) %>%
    which.min
  cluster <- clusters <- results[[opt_id]]
  j <- 1
  ID <- unique(cluster)
  for (i in ID) {
    clusters[cluster == i] = j
    j =  j + 1
  }
  centroids = newCentroids(X_train1, clusters)
  time_taken <- Sys.time() - start_time
  return(
    list(
      centroids = centroids,
      clusters = clusters,
      train_data = list(X_train = X_train1,
                        X_remain = X_train2,
                        id_remain = !train_id),
      parameters = list(div = div,
                        deg = deg,
                        center_ = center_,
                        scale_ = scale_),
      running_time = time_taken
    )
  )
}
```

----

> **Example.1**: We perform $K$-means algorithm with `"gkl"` BD on [Abalone](https://archive.ics.uci.edu/ml/machine-learning-databases/abalone) dataset.

----

```{r}
pacman::p_load(readr)
colname <- c("Type", "LongestShell", "Diameter", "Height", "WholeWeight", "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings")
df <- readr::read_delim("https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data", col_names = colname, delim = ",", show_col_types = FALSE)
n <- nrow(df)
train <- logical(n)
train[sample(n,  floor(n*0.8))] <- TRUE
cl <- df[train,2:(ncol(df)-1)] %>%
  kmeansBD(K = 3, div = "gkl", splits = 0.5, scale_input = TRUE)
table(cl$clusters)
```


<span style="color: #1FAAE3;"><u>Step $F$: Fitting predictive models</u></span>
===

This section builds global models by fitting local model on each given cluster of the obtained partition. This corresponds to the step $F$ of the procedure.

<span style="color: #F0AE14;"><u>Function</u></span> : `fitLocalModels`
----

This function fits local models on all clusters of the obtained partition.

- **Argument**:

    - `kmeans_BD` : An object obtained from `kmeansBD` function.
    - `train_response` : The vector of response variable corresponding to the full `input_data` given to `kmeanBD` function.
    - `model` : Type of local model to fit on all the clusters of the given partition. It should be either a model object which is compactible
    - `formula` : The degree of polynomial BD (if one is used).

- **Value**:

    This function returns a list of the following objects:
    - `local_models` : all the local models fitted on all clusters of the given partition.
    - `kmeans_BD` : the `kmeansBD` object.
    - `data_remain` : a list of the following object:
        - `fit` : the fitted values of the remaining part of the input data.
        - `response` : the actual response values corresponding to the remaining input data.
    - `running_time` : the computational time of the algorithm.

```{r}
fitLocalModels <- function(kmeans_BD,
                           train_response,
                           model = "lm",
                           formula = NULL){
  start_time <- Sys.time()
  X_train <- kmeans_BD$train_data$X_train
  y_train <- train_response[!(kmeans_BD$train_data$id_remain)]
  X_remain <- kmeans_BD$train_data$X_remain
  y_remain <- NULL
  if(!is.null(X_remain)){
    y_remain <- train_response[kmeans_BD$train_data$id_remain]
  }
  pacman::p_load(tree)
  pacman::p_load(randomForest)
  model_ <- ifelse(model == "tree", tree::tree, model)
  K <- nrow(kmeans_BD$centroids)
  if (is.null(formula)){
    form <- formula(target ~ .)
  }
  else{
    form <- update(formula, target ~ .)
  }
  data_ <- bind_cols(X_train, "target":= y_train)
  fit_lookup <- list(lm = "fitted.values",
                     rf = "predicted")
  if(is.character(model_)){
    model_lookup <- list(lm = lm,
                         rf = randomForest::randomForest)
    mod <- map(.x = 1:K, 
               .f = ~ model_lookup[[model_]](formula = form, 
                                             data = data_[kmeans_BD$clusters == .x, ]))
  } else{
    mod <- map(.x = 1:K, 
               .f = ~ model_(formula = form, 
                             data = data_[kmeans_BD$clusters == .x,]))
  }
  pred0 <- NULL
  if(!is.null(X_remain)){
    pred0 <- vector(mode = "numeric", 
                    length = length(y_remain))
    clus <- findClosestCentroid(x. = X_remain,
                                centroids. = kmeans_BD$centroids,
                                div = kmeans_BD$parameters$div,
                                deg = kmeans_BD$parameters$deg)
    for(i_ in 1:K){
      pred0[clus == i_] <- predict(mod[[i_]],
                                   as.data.frame(X_remain[clus == i_,]))
    }
  }
  time_taken <- Sys.time() - start_time
  return(list(
    local_models = mod,
    kmeans_BD = kmeans_BD,
    data_remain = list(fit = pred0,
                       response = y_remain),
    running_time = time_taken
  ))
}
```

---- 

> **Example.2**: From **Example.1** above, multiple linear regression models are built on all the obtained clusters. The mean square error of this model, evaluated on the remaining $50\%$ of the training data is computed.

----

```{r}
fit <- fitLocalModels(train_response = df$Rings[train],
                      kmeans_BD = cl,
                      model = "lm")

mean((fit$data_remain$response- fit$data_remain$fit)^2)
```

<span style="color: #F0AE14;"><u>Function</u></span> : `localPredict`
----

This function allows us to predict any new observation using the candidate model $\cal M_j=({\cal M}_{j,k})_{k=1}^M$ corresponding to Bregman divergence ${\cal B}_j$, for some $j\in J\subset\{1,...,M\}$. 

- **Argument**:

    - `localModels` : The local model object obtained from `fitLocalModels` function.
    - `newData` : New input data to be predicted using the candidate models given in `localModels` argument.
    
- **Value**:

    This function returns a predicted vector of the `newData`.

```{r}
localPredict <- function(localModels,
                         newData){
  kmean_BD <- localModels$kmeans_BD
  K <- nrow(kmean_BD$centroids)
  newData_ <- newData
  if(!(is.null(kmean_BD$parameters$center_))){
    newData_ <- scale(newData,
                      center = kmean_BD$parameters$center_,
                      scale = kmean_BD$parameters$scale_)
    id0 <- (newData_ <= 0)
    if(sum(id0) > 0){
      min_ <- min(newData_[id0])
      newData_[id0] <- runif(sum(id0), min(1e-3, min_/10), min_)
    }
  }
  clus <- findClosestCentroid(x. = newData_,
                              centroids. = kmean_BD$centroids,
                              div = kmean_BD$parameters$div,
                              deg = kmean_BD$parameters$deg)
  pred0 <- vector(mode = "numeric", length = nrow(newData_))
  for(i_ in 1:K){
    pred0[clus == i_] <- predict(localModels$local_models[[i_]],
                                 as.data.frame(newData_[clus == i_,]))
  }
  pred0 <- as_tibble(pred0)
  names(pred0) <- ifelse(kmean_BD$parameters$div == "polynomial",
                         paste0("polynomial", kmean_BD$parameters$deg),
                         kmean_BD$parameters$div)
  return(pred0)
}
```

----

> **Example.3**: The the performance of the candidate model corresponding to `"gkl"` divergence is compared to random forest regression on a $20\%$ testing data.

----

```{r}
y_hat <- localPredict(fit,
                      df[!train, 2:(ncol(df)-1),])
rf <- randomForest(Rings ~ ., data = df[train,2:ncol(df)])
mean((predict(rf, newdata = df[!train,2:ncol(df)])- df$Rings[!train])^2)
mean((y_hat$gkl-df$Rings[!train])^2)
```


<span style="color: #1FAAE3;"><u>Step $C$: Combining methods</u></span>
===

The aggregation methods are available [here <span style="color: #097BC1"> `r fontawesome::fa("github")`</span>](https://github.com/hassothea/AggregationMethods). The aggregation methods are imported into <span style="color: #0287D8;"> **Rstudio** </span> environment.

```{r, warning=FALSE}
pacman::p_load(devtools)
### Kernel based consensual aggregation
source_url("https://raw.githubusercontent.com/hassothea/AggregationMethods/main/KernelAggReg.R")
### MixCobra
source_url("https://raw.githubusercontent.com/hassothea/AggregationMethods/main/MixCobraReg.R")
```

<span style="color: #F0AE14;"><u>Function</u></span> : `stepK`, `stepF` and `stepC`
----

These functions allow to set the values of the parameters in the three steps of the KFC procedure. Each function returns a list of all the parameters given in its arguments.

```{r}
stepK = function(K,
                 n_start = 5,
                 maxIter = 300,
                 deg = 3,
                 scale_input = FALSE,
                 div = NULL,
                 splits = 0.75,
                 epsilon = 1e-10,
                 center_ = NULL,
                 scale_ = NULL){
  return(list(K = K,
              n_start = n_start,
              maxIter = maxIter,
              deg = deg,
              scale_input = scale_input,
              div = div,
              splits = splits,
              epsilon = epsilon,
              center_ = center_ ,
              scale_ = scale_))
}

stepF = function(formula = NULL, 
                 model = "lm"){
  return(list(formula = formula, 
              model = model))
}

stepC = function(n_cv = 5,
                 method = c("cobra", "mixcobra"),
                 opt_methods = c("grad", "grid"),
                 kernels = "gaussian",
                 scale_features = FALSE){
  return(list(n_cv = n_cv,
              method = method,
              opt_methods = opt_methods,
              kernels = kernels,
              scale_features = scale_features))
}
```


<span style="color: #1FAAE3;"><u>Function</u></span>: `KFCreg`
===

This function is the complete implementation of KFC procedure.

- **Argument**:

    - `train_input` : The matrix or data frame of training input data.
    - `train_response` : The training response variable.
    - `test_input`: The testing input data.
    - `test_response` : The response variable of the testing data. It is optional. If it is not `NULL`, the mean square error (mse) is computed.
    - `n_cv` : The number of folds in cross-validation.
    - `parallel` : A logical value specifying whether or not to perform the step $K$ in parallel. By default, `parallel = TRUE`, and note that internet connection is required in this case.
    - `inv_sigma`, `alp` : the inverse normalized constant $\sigma^{-1}>0$ and the exponent $\alpha>0$ of exponential kernel: $K(x)=e^{-\|x/\sigma\|^{\alpha}}$ for any $x\in\mathbb{R}^d$. By default, `inv_sigma = `$\sqrt{1/2}$ and `alpha = 2` which corresponds to the Gaussian kernel.
    - `K_step` : The object obtained from function `stepK` which allowing to set the values of the paramters in step $K$ of the procedure.
    - `F_step` : The object obtained from function `stepF` which allowing to set the values of the paramters in step $F$ of the procedure.
    - `C_step` : The object obtained from function `stepC` which allowing to set the values of the paramters in step $C$ of the procedure.
    - `setGradParamAgg` : The object from the `setGradParameter` function, allowing to set the values of the parameters of **gradient descent** algorithm for the <span style="color: #E6180A;">**1st aggregation method$^1$**</span>.
    - `setGridParamAgg` : The object from the `setGridParameter` function, allowing to set the values of the parameters of the **grid search** algorithm for the <span style="color: #E6180A;">**1st aggregation method$^1$**</span>.
    - `setGradParamMix` : The object from the `setGradParameter_Mix` function, allowing to set the values of the parameters of the **gradient descent** algorithm for the <span style="color: #0832CD">**2nd aggregation method$^2$**</span>.
    - `setGridParamMix` : The object from the `setGridParameter_Mix` function, allowing to set the values of the parameters of the **grid search** algorithm for the <span style="color: #0832CD">**2nd aggregation method$^2$**</span>.
    
- **Value**:

    This function returns a list of the following objects:
    
    - `predict_final` : the final predictions given by the aggregation of all the candidate models.
    - `predict_local` : the predictions given by all the individual candidate models.
    - `agg_method` : the list of aggregation methods obtained in the step $C$ of the procedure.
    - `running_time` : the computational time of the algorithm.

----

  > **Remark.2**: The `parallel` argument above requires internet connection to load the source codes of $K$-means algorithm with BDs from [GitHub <span style="color: #097BC1"> `r fontawesome::fa("github")`</span>](https://github.com/hassothea/KFC-Procedure/blob/master/kmeanBD.R). It is performed on the maximum number of clusters of your machine, and the speed is at least two times faster than without parallelism, however, it is not so stable depending on your internet connection or machine. About the aggregation methods of step $C$,


- <span style="color: #E6180A;">$^1$</span> is the kernel-based consensual aggregation for regression by [Has (2021)](https://hal.archives-ouvertes.fr/hal-02884333v5). The source codes of these functions are available in [KernelAggReg.R](https://github.com/hassothea/AggregationMethods/blob/main/KernelAggReg.R), and the documentation is available [here](https://hassothea.github.io/files/KernelAggReg/KernelAggReg.html).
- <span style="color: #0832CD">$^2$</span> is the aggregation using input-output trade-off by [Fischer and Mougeot (2019)](https://www.sciencedirect.com/science/article/pii/S0378375818302349). The source codes of these functions are available in [MixCobra.R](https://github.com/hassothea/AggregationMethods/blob/main/MixCobraReg.R), and the documentation is available on [here](https://hassothea.github.io/files/KernelAggReg/MixCobraReg.html).

----


```{r}
KFCreg = function(train_input,
                  train_response,
                  test_input,
                  test_response = NULL,
                  n_cv = 5,
                  parallel = TRUE,
                  inv_sigma = sqrt(.5),
                  alp = 2,
                  K_step = stepK(splits = .5),
                  F_step = stepF(),
                  C_step = stepC(),
                  setGradParamAgg = setGradParameter(),
                  setGridParamAgg = setGridParameter(),
                  setGradParamMix = setGradParameter_Mix(),
                  setGridParamMix = setGridParameter_Mix()){
  start_time <- Sys.time()
  lookup_div_names <- c("euclidean",
                         "gkl",
                         "logistic",
                         "itakura",
                         "polynomial")
  div_ <- K_step$div
  ### K step: Kmeans clustering with BDs
  if (is.null(K_step$div)){
    divergences <- lookup_div_names
    warning("No divergence provided! All of them are used!")
  }
  else{
    divergences <- K_step$div %>% 
      map_chr(.f = ~ match.arg(arg = .x, 
                               choices = lookup_div_names))
  }
  div_list <- divergences %>% 
    map(.f = (\(x) if(x != "polynomial") return(x) else return(rep("polynomial", length(K_step$deg))))) %>%
    unlist
  deg_list <- rep(NA, length(div_))
  deg_list[div_list == "polynomial"] <- K_step$deg
  div_names <- map2_chr(.x = div_list,
                        .y = deg_list,
                        .f = (\(x, y) if(is.na(y)) return(x) else return(paste0(x,y))))
  ### Step K: Kmeans clustering with Bregman divergences
  dm <- dim(train_input)
  id_shuffle <- vector(length = dm[1])
  n_train <- floor(K_step$splits * dm[1])
  id_shuffle[sample(dm[1], n_train)] <- TRUE
  if(parallel){
    numCores <- parallel::detectCores()
    doParallel::registerDoParallel(numCores) # use multicore, set to the number of our cores
    kmean_ <- foreach(i=1:length(div_names)) %dopar% {
      devtools::source_url("https://raw.githubusercontent.com/hassothea/KFC-Procedure/master/kmeanBD.R")
      kmeansBD(train_input = train_input,
               K = K_step$K,
               div = div_list[i],
               n_start = K_step$n_start,
               maxIter = K_step$maxIter,
               deg = deg_list[i],
               scale_input = K_step$scale_input,
               splits = K_step$splits,
               epsilon = K_step$epsilon,
               center_ = K_step$center_,
               scale_ = K_step$scale_,
               id_shuffle = id_shuffle)
    }
    doParallel::stopImplicitCluster()
  } else{
    kmean_ <- map2(.x = div_list,
                   .y = deg_list,
                   .f = ~ kmeansBD(train_input = train_input,
                                   K = K_step$K,
                                   div = .x,
                                   n_start = K_step$n_start,
                                   maxIter = K_step$maxIter,
                                   deg = .y,
                                   scale_input = K_step$scale_input,
                                   splits = K_step$splits,
                                   epsilon = K_step$epsilon,
                                   center_ = K_step$center_,
                                   scale_ = K_step$scale_,
                                   id_shuffle = id_shuffle))
  }
  names(kmean_) <- div_names
  ### F step: Fitting the corresponding model on each observed cluster
  model_ <- div_names %>%
    map(.f = ~ fitLocalModels(kmean_[[.x]],
                              train_response = train_response,
                              model = F_step$model,
                              formula = F_step$formula))
  names(model_) <- div_names
  pred_combine <- model_ %>%
    map_dfc(.f = ~ .x$data_remain$fit)
  y_remain <- train_response[!id_shuffle]
  pred_test <- div_names %>%
    map_dfc(.f = ~ localPredict(model_[[.x]],
                                test_input))
  names(pred_test) <- names(pred_combine) <- div_names
  # C step: Consensual regression aggregation method with kernel-based COBRA
  list_method_agg <- list(mixcobra = function(pred){MixCobraReg(train_input = train_input[!id_shuffle,],
                                                                train_response = y_remain,
                                                                test_input = test_input,
                                                                train_predictions = pred,
                                                                test_predictions = pred_test,
                                                                test_response = test_response,
                                                                scale_input = K_step$scale_input,
                                                                scale_machine = C_step$scale_features,
                                                                n_cv = C_step$n_cv,
                                                                inv_sigma = inv_sigma,
                                                                alp = alp,
                                                                kernels = C_step$kernels,
                                                                optimizeMethod = C_step$opt_methods,
                                                                setGradParam = setGradParamMix,
                                                                setGridParam = setGridParamMix)},
                          cobra = function(pred){kernelAggReg(train_design = pred,
                                                              train_response = y_remain,
                                                              test_design = pred_test,
                                                              test_response = test_response,
                                                              scale_input = K_step$scale_input,
                                                              scale_machine = C_step$scale_features,
                                                              build_machine = FALSE,
                                                              machines = NULL,
                                                              n_cv = C_step$n_cv,
                                                              inv_sigma = sqrt(.5),
                                                              alp = 2,
                                                              kernels = C_step$kernels,
                                                              optimizeMethod = C_step$opt_methods,
                                                              setGradParam = setGradParamAgg,
                                                              setGridParam = setGridParamAgg)})
  res <- map(.x = C_step$method,
             .f = ~ list_method_agg[[.x]](pred_combine))
  list_agg_methods <- list(cobra = "cob",
                           mixcobra = "mix")
  names(res) <- C_step$method
  ext_fun <- function(L, nam){
    tab <- L$fitted_aggregate
    names(tab) <- paste0(names(tab), "_", nam)
    return(tab)
  }
  pred_fin <- C_step$method %>%
    map_dfc(.f = ~ ext_fun(res[[.x]], list_agg_methods[[.x]]))
  time.taken <- Sys.time() - start_time
  ### To finish
  if(is.null(test_response)){
    return(list(
    predict_final = pred_fin,
    predict_local = pred_test,
    agg_method = res,
    running_time = time.taken
  ))
  } else{
    error <- cbind(pred_test, pred_fin) %>%
      dplyr::mutate(y_test = test_response) %>%
      dplyr::summarise_all(.funs = ~ (. - y_test)) %>%
      dplyr::select(-y_test) %>%
      dplyr::summarise_all(.funs = ~ mean(.^2))
    return(list(
      predict_final = pred_fin,
      predict_local = pred_test,
      agg_method = res,
      mse = error,
      running_time = time.taken
  ))
  }
}
```

> **Example.4**: A complete KFC procedure is implemented on the same Abalone data, using $6$ BDs `"euclidean"`, `"itakura"`, `"gkl"`, `"logistic"` and `"polynomial"` (of degree $3$ and $6$). Both aggregation methods are used in the step $C$. Two kernel functions are used for each aggregation method: `"gaussian"` (with gradient descent algorithm) and `"epanechnikov"` (with grid search algorithm).

```{r}
train1 <- logical(n)
train1[sample(n,  floor(n*0.8))] <- TRUE
kfc1 <- KFCreg(train_input = df[train1,2:ncol(df)],
                train_response = df$Rings[train1],
                test_input = df[!train1,2:ncol(df)],
                K_step = stepK(K = 3,
                               scale_input = TRUE,
                               div = c("eucl", "ita", "gkl", "log" ,"poly"),
                               deg = c(3, 6),
                               splits = .5),
                C_step = stepC(method = c("cobra", "mixcobra"),
                               opt_methods = c("grad", "grid"),
                               kernels = c("gaussian", "gaussian"),
                               scale_features = FALSE),
                setGradParamAgg = setGradParameter(rate = 1),
                setGridParamAgg = setGridParameter(min_val = .00001,
                                                   max_val = 10,
                                                   n_val = 100),
                setGradParamMix = setGradParameter_Mix(rate = c(1,1)),
                setGridParamMix = setGridParameter_Mix(min_alpha = 1e-10,
                                                       max_alpha = 5,
                                                       min_beta = 1e-10,
                                                       max_beta = 10,
                                                       n_alpha = 20,
                                                       n_beta = 20))
```

> The mean square errors evaluated on $20\%$-testing data of the above computation are reported below.

```{r}
rf1 <- randomForest::randomForest(Rings ~ ., data = df[train1,2:ncol(df)])
kfc1$predict_final %>%
  mutate(rf = predict(rf1, newdata = df[!train1,2:ncol(df)])) %>%
  sweep(MARGIN = 1, STATS = df$Rings[!train1], FUN = "-") %>%
  .^2  %>%
  colMeans
```

---

> <span style="color: #1FAAE3;">&#128214; Read also [KernelAggReg](\files\KernelAggReg.html) and [MixCobraReg](\files\MixCobraReg.html)</span>.

---

- [Has et al. (2021)](https://www.tandfonline.com/eprint/YKGS8GTKDBKYFXEGFWSB/full?target=10.1080/00949655.2021.1891539)
- [Fischer and Mougeot (2019)](https://www.sciencedirect.com/science/article/pii/S0378375818302349)
- [Has (2021)](https://hal.archives-ouvertes.fr/hal-02884333v5)
- [Biau et al. (2016)](https://www.sciencedirect.com/science/article/pii/S0047259X15000950)
- ...
- [dplyr videos](https://www.youtube.com/hashtag/dplyr) `r fontawesome::fa("video")`
- [ggplot2 video tutorial](https://www.youtube.com/hashtag/ggplot2) `r fontawesome::fa("video")`
- [R for data science](https://r4ds.had.co.nz/)

---